ABSTRACT

We present a comparison between Gaussian processes (GPs) and artificial neural
networks (ANNs) as methods for determining photometric redshifts for galaxies, given
training set data. In particular, we compare their degradation in performance as the
training set size is degraded in ways which might be caused by the observational
limitations of spectroscopy. Using publicly-available regression codes, we find that
performance with large, complete training sets is very similar, although the ANN achieves
slightly smaller root mean square errors. Training sets with brighter magnitude limits
than the test data do not strongly affect the performance of either algorithm, until
the limits are so severe that they remove almost all of the high-redshift training
objects. Similarly, the introduction of a plausible number (up to 10%) of inaccurate
redshifts into the training set has little effect on either method. However, if the size
of the training set is reduced by random sampling, the RMS errors of both methods
increase, but they do so to a lesser extent and in a much smoother manner for the case
of GP regression; for the example presented ANNz has RMS errors ~ 20 % worse than
GP regression in the small training-set limit. Also, when training objects are removed
at redshifts 1.3 < z < 1.7, to simulate the effects of the "redshift desert" of optical
spectroscopy, the Gaussian process regression is successful at interpolating across the
redshift gap, while the ANN suffers from strong bias for test objects in this redshift
range. Overall, GP regression has attractive properties for photometric redshift
estimation, particularly for deep, high-redshift surveys where it is difficult to obtain a
large, complete training set. At present, unlike the ANN code, public GP regression
codes do not take account of inhomogeneous measurement errors on the photometric
data, and thus cannot estimate reliable uncertainties on the predicted redshifts.
However, a better treatment of errors is in principle possible, and the promising results in
this paper suggest that such improved GP algorithms should be pursued.

Key words: galaxies: distances and redshifts - surveys - methods: data analysis



1 INTRODUCTION 

1.1 Photometric redshift estimation 

The idea of estimating galaxy redshifts from broad-band
photometry was first applied by Baum (1962) to obtain redshifts
for suspected members of a cluster which were too
faint for spectroscopy using the instruments of the time.
Baum determined the redshifts by first measuring photometry
of a set of ellipticals in known-redshift clusters, to create a
coarsely-sampled template spectral energy distribution, and
then shifting this in logarithmic wavelength space (as if
redshifted) until it matched the photometry of the unknown
objects.

* E-mail: d.bonfield@herts.ac.uk (DGB)

Since then it has become possible to fit to a wide variety
of higher resolution templates, based on empirical (e.g.
Coleman et al. 1980) or synthetic (e.g. Bruzual & Charlot
2003) galaxy spectra. Model photometry is obtained by directly
multiplying a redshifted template spectrum by the
measured response curves for each photometric band, and
is compared with data to determine the best-fitting redshift.
Template-fitting methods are very widely used (e.g.
Bolzonella et al. 2000; Benítez 2000), since they can be applied
to any photometric dataset provided that the instrument
response curves are known and the templates appropriate
for the galaxy being studied. However, there are often
systematic uncertainties in both of these; filter curves, de- 
tector responses and atmospheric transmission are in general 
not precisely known, and spectral templates are necessarily 
composed of or calibrated against low-redshift galaxies, and 
become less reliable as redshift increases. 

Some of these problems with template fitting can be 
mitigated by using known redshifts for calibration (e.g.
Ilbert et al. 2006; Feldmann et al. 2006). However, where a
representative subset of the objects have precisely known 
redshifts, it is much simpler to use these objects to directly 
train some function so that it returns redshift as a function 
of photometric data, and then apply this function to the re- 
mainder of the catalogue. Empirical training-set methods of 
this kind have the additional advantage that it is straight- 
forward to include information from additional catalogued 
parameters such as angular size or surface brightness profile. 

Empirical photometric redshift methods are currently
dominated by artificial neural networks (ANNs; Firth et al.
2003; Vanzella et al. 2004). ANNs can perform general non-linear
mappings (e.g. from photometric data to redshift) by
the use of multiple layers of linear combinations of data (op- 
tionally with non-linear transfer functions on some of the 
nodes), with the weights on these combinations being free 
parameters which are modified during training. Simple feed- 
forward networks (in the sense that values propagate in one 
direction only, e.g. from photometry, via intermediate layers 
of linear combinations, to the estimated redshift) have the 
useful property, during training, that the errors on the red- 
shift estimates help inform the direction in which to modify 
the weights, and they can thus be trained quite efficiently. 
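
To make the feed-forward structure concrete, the sketch below propagates photometry through a small network using NumPy. It is a minimal illustration and not the ANNz implementation: the layer sizes, the tanh transfer function, the random initial weights, and the names `init_network` and `forward` are all assumptions chosen for the example.

```python
import numpy as np

def init_network(layer_sizes, rng):
    """Random weights and zero biases for each pair of adjacent layers (illustrative)."""
    return [(rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Propagate inputs x (n_objects x n_bands) in one direction through the network."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)    # hidden layer: linear combination, then non-linear transfer
    W, b = params[-1]
    return (h @ W + b)[:, 0]      # single linear output node: the estimated redshift

rng = np.random.default_rng(0)
params = init_network([5, 10, 1], rng)             # hypothetical 5-band input, one hidden layer
magnitudes = rng.uniform(20.0, 25.0, size=(4, 5))  # toy photometry for 4 objects
z_est = forward(params, magnitudes)                # untrained, so the values are meaningless
```

Training would then adjust the entries of `params` to reduce the difference between `z_est` and the known redshifts, using the gradient of that error with respect to each weight.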

The most commonly-used implementation of ANNs for 
redshift estimation is the publicly-available ANNz package
(Collister & Lahav 2004). Abdalla et al. (2008b) show that
ANNz matches or surpasses the performance of the best
publicly-available template-fitting codes, when it is provided
with a high quality training set. However, neural networks 
are far from the only tools able to perform non-linear regres- 
sion. One alternative, which has recently been favoured in 
the machine-learning community, is a class of models known 
as Gaussian processes. 



1.2 Gaussian process regression 

A Gaussian process (GP) is defined simply as a collection
of random variables which have a joint Gaussian distribution.
The utility of GPs for regression is that they can
be used as a prior distribution over a space of functions
which are considered as possible models for the data (e.g.
Williams & Rasmussen 1996). The space of functions represented
by a Gaussian process is completely defined by a
mean function (usually taken to be zero) and a covariance
function, which for n training data $x_1 \ldots x_n$ (each of which
would be a vector of photometry in our case) and one test
point $x_*$ (i.e. the photometry of a galaxy with unknown
redshift) takes the form of an $(n + 1) \times (n + 1)$ matrix of
covariances $k_{ij} = k(x_i, x_j)$ between the data.

Thus, the vector of "outputs", which contains redshifts
for the training data $y_1 \ldots y_n$ and the redshift to be inferred
for the test point $y_*$, is assumed to be given by:

$$(y_1, \ldots, y_n, y_*)^T \sim \mathcal{N}(0, C) \qquad (1)$$

where $\mathcal{N}(0, C)$ is a joint Gaussian distribution with
zero mean and covariance matrix $C$, whose elements are the $k_{ij}$
with a noise term $\sigma_n^2$ added to the first n diagonal elements, both to account for
noise in the data (assumed independent and identically distributed)
and to increase numerical stability.

While this distribution appears extremely simple, it 
should be noted that there is considerable freedom to choose 
the covariance function $k(x_i, x_j)$, also commonly referred to
as the kernel, which can be any positive definite function.
This means we can achieve non-linear regression by using a
non-linear kernel. We will discuss our specific choice of kernel
later, but in general one chooses some function whose value
indicates the similarity of datapoints $x_i$ and $x_j$, and which
may have a number of hyperparameters whose optimisation 
forms part of the training process. 

If we define

$$K = [k(x_i, x_j)]_{i,j=1}^{n}, \qquad k_* = [k(x_1, x_*), \ldots, k(x_n, x_*)]^T, \qquad y = (y_1, \ldots, y_n)^T$$

then it can be shown (e.g. Rasmussen & Williams 2006) that
the posterior probability distribution of the predicted redshift
$y_*$ is a Gaussian distribution, with mean $\bar{y}_*$ and variance
$\sigma_*^2$ given by

$$\bar{y}_* = k_*^T (K + \sigma_n^2 I)^{-1} y \qquad (2)$$

$$\sigma_*^2 = k(x_*, x_*) - k_*^T (K + \sigma_n^2 I)^{-1} k_* \qquad (3)$$

where $A^T$ denotes the transpose of A, $A^{-1}$ denotes the inverse
of A, and I is the identity matrix.
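
As a concrete illustration of the posterior mean and variance in equations [2] and [3], the following minimal NumPy sketch evaluates them for a single test point in their standard textbook form (Rasmussen & Williams 2006). The squared-exponential kernel, the toy one-dimensional data, and the names `sq_exp_kernel` and `gp_predict` are assumptions made for this example only; none of them is taken from the code used in this paper.

```python
import numpy as np

def sq_exp_kernel(a, b, length=1.0):
    """Squared-exponential kernel between rows of a and rows of b (illustrative choice)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_predict(x_train, y_train, x_star, sigma_n=0.1, kernel=sq_exp_kernel):
    """Posterior mean and variance of the output at a single test point x_star."""
    K = kernel(x_train, x_train)                     # n x n training covariances
    k_star = kernel(x_train, x_star[None, :])[:, 0]  # covariances with the test point
    k_ss = kernel(x_star[None, :], x_star[None, :])[0, 0]
    A = K + sigma_n**2 * np.eye(len(x_train))        # noise added to the training diagonal
    L = np.linalg.cholesky(A)                        # the O(n^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, k_star)
    mean = k_star @ alpha                            # k_*^T A^{-1} y
    var = k_ss - v @ v                               # k(x_*, x_*) - k_*^T A^{-1} k_*
    return mean, var

# Toy 1-D example: noise-free samples of a smooth function.
x_train = np.linspace(0.0, 3.0, 7)[:, None]
y_train = np.sin(x_train[:, 0])
mean, var = gp_predict(x_train, y_train, np.array([1.25]), sigma_n=0.05)
```

The Cholesky factorisation is used here instead of an explicit matrix inverse because it is the usual numerically safer way to apply the inverse of a symmetric positive definite matrix.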



1.3 Previous work 

Way & Srivastava (2006) first evaluated Gaussian processes
for photometric redshift estimation, and found that an ensemble
of neural networks produced a slightly smaller RMS
error. However, they used a smaller training set for the GP-based
method due to computational limitations (the naive
implementation of equations [2] and [3] requires the inversion
of an $n \times n$ matrix, which involves $O(n^3)$ operations) so this
was not a completely fair comparison of the methods.

More recently, Foster et al. (2009) have shown that it
is possible to use rank-reduction techniques which allow inference
using large training sets, with n objects, to be performed
in $O(nm^2)$ operations (where $m < n$). Such methods
make use of a full covariance matrix for a subset of only m 
objects (an m x m matrix), in conjunction with an m x n 
matrix of covariances between the full and partial training 
sets. 
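
The role of the m x m and m x n matrices can be sketched with a generic "subset of regressors" approximation, as described in Rasmussen & Williams (2006). This is offered only as an illustration of how the cost becomes $O(nm^2)$; it is not claimed to be the specific algorithm of Foster et al. The kernel, the random choice of subset, the `jitter` stabiliser, and the function names are all assumptions.

```python
import numpy as np

def sq_exp_kernel(a, b, length=0.5):
    """Squared-exponential kernel between rows of a and rows of b (illustrative choice)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def sr_predict_mean(x_train, y_train, x_test, m, sigma_n=0.05, jitter=1e-6, seed=0):
    """Subset-of-regressors predictive mean using m randomly chosen training objects."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x_train), size=m, replace=False)
    x_m = x_train[idx]
    K_mm = sq_exp_kernel(x_m, x_m)      # m x m: full covariance of the subset
    K_mn = sq_exp_kernel(x_m, x_train)  # m x n: subset versus full training set
    K_ms = sq_exp_kernel(x_m, x_test)   # m x n_test
    # Forming K_mn @ K_mn.T costs O(n m^2); the linear solve is only O(m^3).
    A = K_mn @ K_mn.T + sigma_n**2 * K_mm + jitter * np.eye(m)
    w = np.linalg.solve(A, K_mn @ y_train)
    return K_ms.T @ w

# Toy 1-D example with n = 400 training points and a rank of m = 30.
x_train = np.linspace(0.0, 3.0, 400)[:, None]
y_train = np.sin(x_train[:, 0])
x_test = np.array([[0.5], [1.5], [2.5]])
pred = sr_predict_mean(x_train, y_train, x_test, m=30)
```

All n training outputs still enter the prediction through `K_mn @ y_train`, which is why such methods can exploit a large training set while only ever solving an m x m system.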

With the ability to use information from a large training
set, of the same size as used by ANN-based methods,
Way et al. (2009) find that GP regression yields a slightly
lower RMS error than ANNz, when applied to galaxies
drawn from the Sloan Digital Sky Survey (SDSS; York et al.
2000). Way et al. also characterise the performance of GP
regression using different kernel functions and choices for the 
size of the reduced rank m. 

These existing evaluations of GP regression for photo- 
metric redshift estimation have used an SDSS-type survey as 
a model. That is, they have assumed that a very large set of 
training objects is available, and have implicitly ensured that 
the training set is absolutely complete by selecting training 
and test objects in the same way from the SDSS spectro- 
scopic sample (i.e. those SDSS galaxies which have known 
spectroscopic redshifts). Use of SDSS data has also resulted 
in the exploration of a rather narrow range in redshift. 

In this paper, we extend this work by considering a number
of observationally-motivated restrictions on the training
sets. In particular, we consider cases where the set of
available spectra is small but complete, and cases in which
the training set is limited in magnitude in a different manner
than the test data. We also examine the effect of reducing
the number of training objects with spectral types lacking
emission lines, or in the emission line "desert" at redshifts
1.3 < z < 1.7, and compare the resistance of the methods
to "bad" training objects with incorrectly determined redshifts.
We use photometry simulated to represent a deep,
high-redshift survey.



2 METHOD 
2.1 Algorithms 

For all tests of GP regression in this paper, we use the stable,
reduced-rank GP regression code developed by Foster et al.
(2009) and made publicly available as the stableGP package^1
for MATLAB^2. For simplicity, we consider only one
of the possible GP algorithms available in this package, the
"SR-VP" method, since it is considered to be the most numerically
stable and accurate technique. We set the reduced
rank m = 800, since the results of Way et al. show that the
precision of the method is a weak function of m, and they
find good performance with this rank size. The GP kernel
used is the "neural network" covariance function. This function
is so named because it was shown by Neal (1996) to
be equivalent to a feed-forward neural network with a single
hidden layer, in the limit of an infinite number of hidden
units. It can be written

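$$k_{nn}(x_p, x_q) = \alpha \sin^{-1}\left( \frac{\tilde{x}_p^T \beta \tilde{x}_q}{\sqrt{(1 + \tilde{x}_p^T \beta \tilde{x}_p)(1 + \tilde{x}_q^T \beta \tilde{x}_q)}} \right) \qquad (4)$$

where $\alpha$ and $\beta$ are scalar hyperparameters to be optimised,
and $\tilde{x}$ is the $x$ vector extended by appending an element
with the value 1.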
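
The kernel can be evaluated for a pair of photometry vectors as in the short sketch below. It assumes the form $\alpha \sin^{-1}$ of the normalised, $\beta$-scaled inner product of the augmented vectors, with no additional numerical factors; the function name `nn_kernel`, the default hyperparameter values, and the example magnitudes are hypothetical.

```python
import math

def nn_kernel(x_p, x_q, alpha=1.0, beta=1.0):
    """Neural-network covariance between two input vectors (assumed form, see lead-in)."""
    xp = list(x_p) + [1.0]   # augmented vector: append an element with the value 1
    xq = list(x_q) + [1.0]

    def dot(a, b):
        return sum(u * v for u, v in zip(a, b))

    num = beta * dot(xp, xq)
    den = math.sqrt((1.0 + beta * dot(xp, xp)) * (1.0 + beta * dot(xq, xq)))
    # For beta > 0 the ratio lies strictly inside (-1, 1), so asin is always defined.
    return alpha * math.asin(num / den)

a = [22.1, 21.5, 21.0]   # hypothetical magnitudes in three bands
b = [23.4, 22.9, 22.2]
k_ab = nn_kernel(a, b)
k_ba = nn_kernel(b, a)
k_zero = nn_kernel([0.0], [0.0])   # augmented vectors are both [0, 1]
```

In training, $\alpha$ and $\beta$ would be optimised along with the noise term, which is a far smaller set of free parameters than the weights of a multi-layer network.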

^1 https://dashlink.arc.nasa.gov/algorithm/stablegp/
^2 http://www.mathworks.com/

We compare the performance of the GP regression
method against that of the ANNz neural network code. In
ANNz we use a network architecture with two hidden layers,
each with twice as many nodes as we have inputs,
which was shown to be effective for photometric data by
Collister & Lahav (2004) and has become the default option
for most users of ANNz. Since our simulated dataset has 7
photometric bands, this means a 7:14:14:1 network. We
use precisely the same training data with ANNz and the GP
method. However, ANNz uses an additional set of objects
with known redshifts as a "validation" dataset, which helps
prevent the network from becoming over-specialised on the
objects in the training set. Over-specialisation, which means
that the model fits the training data in such detail that it
performs poorly on unseen data, is a particular problem with
small training sets. We note that it is possible, in principle,
to train ANNs without the use of a validation set by using
a Bayesian framework (e.g. lMacKavl[l99lh . 

Our GP implementation is extremely simple, and in its 
current form does not make use of the estimated errors on 
the photometry of the training or test data. As such, it does 
not provide realistic estimates of the errors in derived redshifts,
which are provided by ANNz and can be used to
define a "clean" sample of objects with reliable redshift estimates
as shown by Abdalla et al. (2008a). In the presentation
of our ANN results, we show both the full set of test
objects and a set "cleaned" by the removal of objects with
predicted errors of $\sigma_z > 0.3$, so that the effectiveness of
this procedure can be judged. In future work, we hope to
provide reliable estimates of errors in quantities predicted
by GP regression, and are exploring options such as Monte
Carlo realisations of training and/or test catalogues in order
to more fully account for measurement errors.



2.2 Simulated data 

We use a photometric dataset which was designed to simulate
the performance of a proposed configuration for the
DUNE dark energy mission (since redesigned and incorporated
into the Euclid mission), with infrared J and H filters
plus a very broad RIZ filter, in combination with a ground-based
optical survey with the g, r, i, and z filters, to depths
proposed in an early DES (Dark Energy Survey) configuration.
This dataset is part of a larger set of simulations
used by Abdalla et al. (2008a) to evaluate the effects of filter
selection and training set completeness on the accuracy
of photometric redshifts calculated using ANNz.

The catalogue that we use consists of 142803 objects, 
flux limited at RIZ < 25. We first split this into training, 
validation (used only with ANNz), and test datasets, with 
23638, 11949, and 107216 objects respectively. We refer to 
these training (validation) data as the "full training (valida- 
tion) set" . Throughout this work we use the complete set of 
test data, to represent the galaxies which would be observed 
photometrically in a potential future survey such as this one. 
To examine the possible effects of incomplete spectroscopic 
information (something which is difficult to avoid in imag- 
ing surveys which probe faint objects at high redshift), we 
restrict the training and validation sets in several ways. 

Our first test is to restrict the training set to brighter 
flux limits than the test data. This is motivated by the fact 
that, for a given telescope aperture, it is generally possible 
to image fainter objects than one can obtain spectra for in 
a reasonable amount of observing time. Placing magnitude 
limits of RIZ < 23 and RIZ < 22 on the training and validation
sets results in training sets with 2721 and 858 objects,
and validation sets with 1410 and 414 objects respectively. 

We also test the effects of training set incompleteness 
due to missing objects of a particular type or redshift range, 
since spectroscopic redshift surveys often have better completeness
for objects with easily detectable emission lines.
To do this we test a series of training sets with different 
fractions (20%, 40%, 60%, 80%, and 100%) of the early-type 
galaxies (i.e. those simulated using E or E+S0 templates) re- 
moved randomly, and similarly test the removal of galaxies 
(of all spectral types) in the 1.3 < z < 1.7 "redshift desert", 
where no strong emission lines are accessible to optical spec- 
trographs. 

Our third test is to randomise the redshifts of a fraction 
of training set objects. We do this because spectroscopic 
samples quite often contain a small fraction of objects with
incorrectly determined redshifts (e.g. Fernández-Soto et al.
2001).

The final test uses a "complete" training set, with the 
same magnitude limit and selection effects as the full train- 
ing set, but reduces the number of objects by drawing ob- 
jects at random from this set. We use randomly selected 
training sets containing 100, 200, 500, 1000, 2000, 5000, and 
10000 objects. For ANNz we also produce random valida- 
tion sets with half as many objects as the training sets. 
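
The four kinds of training-set degradation described above can be sketched with a few small selection functions. This is an illustrative outline, not the procedure actually used to build the catalogues: the dictionary keys `riz` and `z`, the uniform distribution used for the incorrect redshifts, the per-object removal probability, and the toy catalogue are all assumptions.

```python
import random

def magnitude_cut(objs, limit):
    """Keep only objects brighter than the magnitude limit (smaller RIZ is brighter)."""
    return [o for o in objs if o["riz"] < limit]

def thin_redshift_range(objs, z_lo, z_hi, fraction, rng):
    """Drop each object with z_lo < z < z_hi with probability `fraction`."""
    return [o for o in objs
            if not (z_lo < o["z"] < z_hi and rng.random() < fraction)]

def randomise_redshifts(objs, fraction, z_max, rng):
    """Give a random `fraction` of objects a redshift drawn uniformly from [0, z_max]."""
    out = [dict(o) for o in objs]   # copy so that the input catalogue is untouched
    for o in rng.sample(out, round(fraction * len(out))):
        o["z"] = rng.uniform(0.0, z_max)
    return out

def random_subsample(objs, n, rng):
    """Draw n objects at random, without replacement."""
    return rng.sample(objs, n)

rng = random.Random(42)
# Hypothetical toy catalogue standing in for the simulated training set.
train = [{"riz": rng.uniform(18.0, 25.0), "z": rng.uniform(0.0, 4.0)}
         for _ in range(1000)]

bright = magnitude_cut(train, 23.0)                         # brighter flux limit
no_desert = thin_redshift_range(train, 1.3, 1.7, 1.0, rng)  # remove the whole "desert"
noisy = randomise_redshifts(train, 0.10, 4.0, rng)          # 10% incorrect redshifts
small = random_subsample(train, 100, rng)                   # small but complete set
```

The same functions would be applied to the validation set where one is used, so that training and validation data are restricted in the same way.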



3 RESULTS AND DISCUSSION 

Figure [1] compares the photometric redshift estimates from 
GP regression and ANNz when trained using the full RIZ < 
25 training set. We show the estimated photometric redshift 
as a function of spectroscopic (true) redshift, using colour 
to indicate the density of points. For ANNz we also show 
the distribution obtained after removing all points with estimated
uncertainties $\sigma_z > 0.3$ (in this case 30% of the points
are removed); for some applications it is preferred to have an
incomplete sample of galaxies such as this, with more reli- 
able and precise photometric redshifts, but for many applica- 
tions removal of such objects limits the scientific usefulness 
of the sample. For example the study of galaxy luminosity 
functions, evolution of galaxies with redshift and measure- 
ments of environmental density would be severely hampered 
by this kind of incompleteness. 

When using the full training set, the ANN-based 
method has a slightly tighter correlation, particularly at low 
redshift, but both methods perform quite well, with similar 
outliers. While this comparison is not especially discrimi- 
natory, it demonstrates that GPs, like ANNs, are able to 
cope well with a larger range in redshift than present in cur- 
rent large surveys such as the SDSS. Computationally, the 
methods are comparable even with the full training set, each 
taking a few minutes to train on a fairly standard worksta- 
tion; this would not be the case if a naive, full GP regression 
was used instead of the reduced-rank technique. With the 
reduced rank size m held constant, the calculation time for 
GP regression increases only linearly with training set size 
n. 

When the ANN results are "cleaned" by removing objects
with estimated redshift errors $\sigma_z > 0.3$ (panel (c)),
many outliers are removed, improving the RMS error considerably
(from 0.319 to 0.130 for objects in the range 0.5 <
z < 1.0 and from 0.224 to 0.159 for redshifts 1.5 < z < 2.0;
the RMS errors for the GP regression are 0.353 and 0.202
in the same redshift intervals).

Figures [2] and [3] show similar comparisons for the tests 
of magnitude-limited training sets, with RIZ < 23 and 
RIZ < 22 respectively. Perhaps surprisingly, the RIZ < 23 
training set yields slightly better performance than the full 
set at the lowest redshifts, especially for GP regression, re- 
ducing the number of outliers and increasing the density of 
points close to the y = x line. This is likely due to the removal
of objects at z > 3, which can have similar colours to
low redshift objects due to the redshifted Lyman break at
~ 912 Å mimicking the ~ 4000 Å Balmer break, and thus
cause ambiguity. In general, even with this rather conservative
magnitude limit we find that the performance of GPs
and ANNs is similar, and, except for high-redshift objects, 
not a great deal worse than that with the full training set. 

It is only when we make the RIZ < 22 cut in the training
set, three magnitudes brighter than the limit of the test
catalogue, that we find a very substantial loss in performance.
Although both methods perform poorly, it is worth
noting some differences between the two techniques. The
GP regression suffers from very considerable scatter, but
has redshift estimates which are less biased on average than
those from ANNz (i.e. the mean estimate lies closer to the
true value). The bias in the ANNz estimates is potentially a
serious issue for those studying large-scale structure, as the
systematic clumping of estimated
redshifts can lead to false detections of overdensities
(for a full discussion see, e.g., van Breukelen et al. 2006). It
is also important to note that the "cleaned" sample from 
ANNz has roughly the same uneven distribution as the full 
set; this makes the point that reliability of the estimated 
redshift errors can also be affected by incompleteness in the 
training set. 

Although the magnitude-limited training sets clearly al- 
ter the performance of the redshift estimation, the fact that 
a cut from RIZ < 25 to RIZ < 23 has a minimal effect sug- 
gests that the distribution of magnitudes of objects in the 
training set is not the crucial parameter. The absolute num- 
ber of training objects at the redshifts of interest appears to 
be more significant, i.e. the RIZ < 22 training set is very 
poor at high redshift because it contains, for example, only 
11 training objects with z > 1. The following results, show- 
ing the effect of reducing the training-set size by drawing 
a random sample of objects, demonstrate this effect more 
clearly. 

Figure [4] shows how the RMS error in photometric
redshift estimates (calculated over two redshift ranges - 
0.5 < z < 1 and 1.5 < z < 2) changes as a function of 
training set size, when the training sets all have the same 
redshift distribution as the full set. As discussed above, the 
performance in the limit of large training sets is very similar, 
with ANNs slightly better in the low-redshift bin and GPs 
slightly better at high-redshift. However, there is a clear dif- 
ference between the performance of the GP and ANN meth- 
ods for small training sets. While the RMS error increases 
for both methods with decreasing training-set size, there is 
a rather sharp transition for ANNz and a much more grad- 
ual decline in the case of GP regression. While there will 
clearly be difficulties, in practice, in assessing the accuracy 
of any photometric redshift method when only a small number
of known redshifts is available, this smooth behaviour as
a function of training-set size makes it appear more feasible
to estimate GP regression accuracy (by, say, using half the
known data for training and half for testing) than in the
case of ANNs.

Figure 1. Photometric redshifts recovered using (a) GP regression and (b) ANNz, trained with the full set of 23638 training objects
(and an additional 11949 objects for validation with ANNz). Panel (c) shows the ANNz results "cleaned" by the removal of test objects
with estimated redshift errors > 0.3. The colour scale indicates the number of test objects in a pixel, where pixels represent intervals of
0.02 in redshift.

Figure 2. As Figure [1] but using only the 2721 training objects (and 1410 validation objects) with RIZ < 23.

Figure 3. As Figure [1] but using only the 858 training objects (and 414 validation objects) with RIZ < 22.

Figure 4. A comparison of the RMS error in photometric redshift
estimates (defined for the objects in each range in spectroscopic
redshift as $\langle (z_{\rm phot} - z_{\rm spec})^2 \rangle^{1/2}$) as a function of training set size,
when training and validation objects are randomly selected from
a large, complete set of objects with known redshifts.

We choose to quote RMS errors in particular redshift 
ranges because the overall RMS error is not especially informative;
it is too strongly affected by performance at low
redshift where the majority of simulated objects lie. Even so,
these RMS measures, while useful metrics, do not provide
information about bias and other kinds of false structure in
the redshift distributions, so for particular cases of interest
we also show the detailed plots of photometric redshift versus
true redshift, as for the magnitude-limited training sets.
Figure [5] shows the detailed performance for the case with 
500 training set objects (and, for the ANN method only, 250 
validation objects). It is clear from this plot that the results 
from ANNz are more biased than those from GP regression, 
as well as having larger RMS error. 
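
The statistics quoted in this section can be computed as in the following sketch, which evaluates the RMS error and the mean bias in a range of spectroscopic redshift, optionally after "cleaning" on the estimated error. The function names and the four-object example are hypothetical and serve only to show the arithmetic.

```python
import math

def select(z_phot, z_spec, z_lo, z_hi, sigma_z=None, sigma_max=None):
    """Residuals z_phot - z_spec for objects with z_lo < z_spec < z_hi.

    If sigma_max is given, objects with sigma_z > sigma_max are dropped ("cleaning").
    """
    res = []
    for i, (zp, zs) in enumerate(zip(z_phot, z_spec)):
        if not (z_lo < zs < z_hi):
            continue
        if sigma_max is not None and sigma_z[i] > sigma_max:
            continue
        res.append(zp - zs)
    return res

def rms_error(res):
    """Root mean square of the residuals (assumes a non-empty list)."""
    return math.sqrt(sum(r * r for r in res) / len(res))

def mean_bias(res):
    """Mean residual, which the RMS error alone does not reveal."""
    return sum(res) / len(res)

# Toy example with four test objects.
z_spec = [0.6, 0.8, 1.6, 1.8]
z_phot = [0.7, 0.8, 1.6, 1.4]
sigma_z = [0.1, 0.1, 0.1, 0.5]

low = select(z_phot, z_spec, 0.5, 1.0)
high = select(z_phot, z_spec, 1.5, 2.0)
high_clean = select(z_phot, z_spec, 1.5, 2.0, sigma_z, 0.3)
```

In this toy case the single outlier in the high-redshift range has a large estimated error, so cleaning removes it and the RMS error in that range falls to zero, at the cost of halving the sample.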

For comparison, we also attempted to calculate photometric
redshifts using the HyperZ template-fitting code
(Bolzonella et al. 2000), which does not require any training
data at all. We tried empirical and theoretical
(Bruzual & Charlot 2003) templates. While the theoretical
templates were better than the empirical ones, they still
performed less well than the GP regression even with only
100 training objects, achieving RMS errors in the ranges
0.5 < z < 1.0 and 1.5 < z < 2.0 of 0.794 and 0.390 respectively.

Figure 6. As Figure [4] but showing the effect on RMS errors of
randomly changing the redshifts of a subset of training objects.

We should note that one could choose a simpler neu- 
ral network architecture which, with fewer free parameters, 
would be easier to train with a smaller training set; in gen- 
eral, one can optimise the network architecture to suit the 
size of the training set. This could be achieved either by 
the use of extra validation data (which would, by definition, 
be in short supply in cases with few available redshifts), or 
model selection by optimising the Bayesian evidence (e.g.
MacKay 1994). However, while possible in principle, and in- 
deed worth pursuing in practice, neural network architecture 
optimisation is not at present widely used for the calculation 
of photometric redshifts. 
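
As a rough illustration of what "fewer free parameters" means here, the sketch below counts the weights of a fully connected network, assuming one bias per non-input node (an assumption; the exact parameterisation used by ANNz may differ). Under that assumption the 7:14:14:1 network has 337 free parameters, against 46 for a hypothetical 7:5:1 network.

```python
def n_free_parameters(layers):
    """Weights plus one bias per non-input node for a fully connected network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers[:-1], layers[1:]))

default_net = n_free_parameters([7, 14, 14, 1])   # the 7:14:14:1 architecture
smaller_net = n_free_parameters([7, 5, 1])        # hypothetical simpler alternative

# Training objects available per free parameter for a few training-set sizes.
for n_train in (100, 500, 2000, 10000):
    print(n_train, round(n_train / default_net, 2), round(n_train / smaller_net, 2))
```

With only 100 training objects the larger network has more free parameters than data points, which is the regime in which a simpler architecture would be expected to help.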

Most of the other training-set limitations tested have
small effects on the RMS errors, which are similar for both
ANNs and GPs. Figure [6] shows that changing the redshifts
of a subset of training objects (to a random, incorrect value)
has little effect until the fraction of objects modified is large.
Fernández-Soto et al. (2001) find a wrong redshift fraction
of only ~ 6 % in their spectroscopic data, which appears to
be too small to have an effect. Figure [7] shows that removing
objects of a particular spectral type (in this case those with
some fraction of an elliptical template) also has a negligible
effect until all objects of that type are removed (a situation
which is unlikely to occur in practice).

Figure 7. As Figure [4] but showing the effect on RMS errors of
removing objects with early-type template spectra.

Figure 8. As Figure [4] but showing the effect on RMS errors of
removing objects with redshifts 1.3 < z < 1.7.

The case of removing objects in the 1.3 < z < 1.7 "red- 
shift desert" is a little more interesting. Although Figure [8] 
shows very little effect on RMS errors for even quite severe 
reductions in the number of training objects in the redshift 
desert, Figure [9] shows that, in the extreme case where all 
objects at these redshifts are eliminated, the behaviour of 
GP and ANN regression is quite different; the GP seems to 
much more easily interpolate across the gap in redshift than 
the ANN. This is potentially a useful property when dealing 
with any training set with gaps in parameter space; in this 
case the neural network method causes a false concentration 
of test objects at the upper limit of the redshift desert, which 



would likely cause false positives in searches for clusters, for 
example. 

Overall, we see that off-the-shelf GP regression has both 
advantages and disadvantages compared to ANNz. ANNz 
makes full use of the uncertainty information for each dat- 
apoint, and thus produces reliable error estimates on the 
redshift, provided that the training set is reasonably large 
and representative. Its ability to use uncertainty information 
properly may also be responsible for its higher precision in 
the large training-set limit; the GP regression treats all data
equally, even those with large errors on their photometry. On 
the other hand, the GP has far fewer hyperparameters to fit, 
and, perhaps because of this, appears to behave more stably 
in the case of sparse data. 

Since we know that GP and ANN regression have iden- 
tical mathematical properties (at least for special cases of 
both methods, as discussed above), we hope that it should be 
possible to combine the best properties of both methods ex- 
plored here. This might be achieved either by incorporating 
a treatment of inhomogeneous errors into the GP method, 
or by training ANNs differently, e.g. allowing freedom in the 
network architecture and selecting the best network based 
on Bayesian evidence rather than a validation set. Of these 
possibilities, the more-flexible ANN option is better devel- 
oped in the machine-learning community, but the addition 
of input errors to GP regression ought to be feasible (as
discussed by Girard & Murray-Smith 2005) and may offer a
simpler training option. 



4 CONCLUSIONS 

We have compared a simple implementation of GP regression,
based on the stableGP code of Foster et al. (2009),
with the popular neural network code ANNz.

With large, complete training sets the precision of 
the methods is very similar, with ANNz achieving slightly 
smaller RMS errors, particularly at redshifts z < 1. How- 
ever, we note that GP regression, unlike ANNz, does not 
require an additional set of validation data to preserve gen- 
erality, and so could take advantage of a larger training sam- 
ple. 

We investigate placing magnitude limits on the train- 
ing sets but find that the performance of neither algorithm 
is strongly affected, until the limits are so severe that they 
remove a large fraction of the training objects in the red- 
shift range of the test object. In this case the GP algorithm 
produces less biased results but with a larger scatter. Sim- 
ilarly, the introduction of a plausible number (up to 10%) 
of inaccurate redshifts into the training set, or the removal 
of objects of a particular spectral type, has little effect on 
either method. 

If the size of the training set is instead reduced by random
sampling, the RMS errors of both methods increase,
but they do so to a lesser extent and in a much smoother 
manner for the case of GP regression. With this particu- 
lar dataset and network architecture we find that the ANN 
method deteriorates sharply with training sets smaller than 
about 2000 objects, where its RMS errors are ~ 20% larger 
than for GP regression, with considerably increased bias. 
While the GP results could in principle be reproduced by 
some appropriately selected ANN architecture, the computational
difficulty of selecting and then training such a network
may make the GP method preferable.

Figure 9. Photometric redshifts recovered by the two methods with all objects at redshifts 1.3 < z < 1.7 removed from the training set.

In addition, when training objects are removed at red- 
shifts 1.3 < z < 1.7, to simulate the effects of the "redshift 
desert" of optical spectroscopy, the GP regression is suc- 
cessful at interpolating across the redshift gap, while ANNz 
suffers from strong bias for test objects in this redshift range. 

These properties make GP regression a very attractive 
algorithm for the estimation of photometric redshifts, par- 
ticularly in the case where few training redshifts are avail- 
able at the redshift of interest. Since telescopes of any given 
aperture are in general able to image fainter objects than 
they can obtain spectra for, this is likely to be the situa- 
tion for the vast majority of both present and future deep, 
high-redshift surveys. 

At present, unlike the ANN code, public GP regression 
codes do not take account of inhomogeneous measurement 
errors on the photometric data, and thus cannot estimate 
reliable uncertainties on the predicted redshifts. Such un- 
certainty values can be used to form subsamples of objects 
with, on average, more accurate redshift estimates. While 
such subsamples have limited applications due to the com- 
plex selection effects involved, redshift uncertainty values 
can also be used (e.g. using a Monte Carlo method) to cal- 
culate more robust statistics about the whole population of 
galaxies. 

The advantages of GPs demonstrated in this paper sug- 
gest that such improved GP algorithms should be pursued, 
and we are exploring both Monte Carlo and analytic options 
for improving their treatment of errors. 